Abstract. Employing a recent technique that represents nonstationary data as a juxtaposition of
locally stationary patches of different lengths, we introduce a comprehensive analysis of the key
observables in a financial market: the trading volume and the price fluctuations. From the
segmentation procedure we obtain a quantitative description of statistical features of these two
quantities, often named stylized facts, namely the tails of the distributions of trading volume and
price fluctuations, a dynamics compatible with the U-shaped profile of the volume in a trading
session, and the slow decay of the autocorrelation function. The segmentation of the trading volume
series provides evidence of slow evolution of the fluctuating parameters of each patch, pointing to
the mixing scenario. Assuming that long-term features are the outcome of a statistical mixture of
simple local forms, we test and compare different probability density functions to provide the
long-term distribution of the trading volume, concluding that the log-normal gives the best
agreement with the empirical distribution. Moreover, the segmentation results for the magnitude of
price fluctuations are quite different from those for the trading volume, indicating that changes
in the statistics of price fluctuations occur at a faster scale than in the case of trading volume.

PACS. 05.10.-a Computational methods in statistical physics and nonlinear dynamics - 05.45.Tp Time
series analysis - 89.65.Gh Economics; econophysics, financial markets, business and management

Corresponding author: sdqueiro@gmail.com

1 Introduction

In the last decades, the description of dynamic and statistical quantities related to Finance has
turned into an appealing subject for the exploration of physical concepts beyond the scope they
were originally introduced to [?]. Much of the effort has been put upon shedding light on
trustworthy mechanisms leading to the emergence of power-law distributions, e.g., for the log-price
fluctuations, the distribution of which exhibits an asymptotic scale-invariant form with slow
convergence to the Gaussian, with the Berry-Esseen theorem defining the upper limit of the
difference between obtained and expected cumulative distribution functions [?]. It is well
established that the changes in the share price are triggered by a myriad of factors (previous
price fluctuations, deviations from the target price, news, etc.) that make some people willing to
buy and others to sell. Moreover, the activity of a financial market is non-stationary [?]. To this
trait was assigned the origin of fat tails in financial observables like the trading volume [?],
which in turn would imply fat tails in the price fluctuations and in volatility as well [11]. This
scenario abides by the Wall Street heuristic law disseminated by Karpoff's work that it takes
volume to make price move [12].

In both Physics and Finance, the treatment of non-stationary data is often tackled assuming sets
of coupled stochastic differential equations representing different scales of evolution of the
system, which frequently pave the way to demanding solutions [13]. Mixtures of stochastic
processes, e.g., compound Poisson processes, can also be considered [?]. However, allowing for the
fact that to fit real data some of these equations must have large relaxation times, the modeling
of non-stationary quantities can be efficiently simplified by considering that the system is in a
generic steady state regime and the data are well described by a juxtaposition of intervals of
length $\ell$ characterized by a few, $N$, parameters $\{\pi\}$ [?]. At the scale $\ell$, the
parameters are assumed constant, but in the long-term they follow a certain probability density
function. Within this approach, the length of the local patches $\ell$ is systematically constant
as well. This time independence can only be understood as the first order of the juxtaposition
approach, because it is unlikely that complex systems are so well behaved in this respect.
Furthermore, it is intuitive to think that a dynamics for the length of the regions of local
stationarity contains valuable information with respect to
overall features of the observable, e.g., the evolution of the 
correlations. In addition, we can look at the outcome of 
the single scale proposal as a (new) mixture of 'cut and 
pasted' elementary scales that screens the actual statisti- 
cal nature of the local parameters, in the context of what 
was called superstatistics by Beck and Cohen [15]. 

Recently, we introduced a non-parametric segmentation procedure, dubbed Kolmogorov-Smirnov
segmentation (KSS) [16], aimed at defining quasi-stationary segments of varying length in
non-stationary time series (see [?]). Although the non-uniform segmentation of non-stationary time
series is not a brand new approach to the problem [17], KSS clearly outperforms, in accuracy or
speed, previous approaches that use either local moments testing or principal component
analysis [?].

Generically, the hypothesis that the length of the segments of local stationarity is not constant
sets the scene for a real dependence between the values of the local parameters $\{\pi\}$ and the
duration $\ell$ of the patches. Therefore, the long-term distribution of an observable $O$ within
this framework is given by

\[ P(O) = \int \cdots \int p(O; \{\pi\}, \ell) \, p(\{\pi\}; \ell) \, p_\ell(\ell) \, d\pi_1 \cdots d\pi_N \, d\ell, \tag{1} \]

where $p(O; \{\pi\}, \ell)$ represents the conditional probability of having a value $O$ given
local parameters $\{\pi\}$ in a segment of length $\ell$. Assuming
$p_\ell(\ell) = \delta(\ell - \lambda)$ we get the constant $\ell$ case.
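
As a quick check of the last statement, and assuming (as the delta-function remark suggests) that the integration in Eq. (1) also runs over the segment length $\ell$, inserting $p_\ell(\ell) = \delta(\ell - \lambda)$ collapses the $\ell$ integral:

```latex
\begin{align*}
P(O) &= \int \cdots \int p(O; \{\pi\}, \ell)\, p(\{\pi\}; \ell)\,
        \delta(\ell - \lambda)\, d\pi_1 \cdots d\pi_N\, d\ell \\
     &= \int \cdots \int p(O; \{\pi\}, \lambda)\, p(\{\pi\}; \lambda)\,
        d\pi_1 \cdots d\pi_N .
\end{align*}
```

What remains is a mixture over the local parameters alone, evaluated at the single scale $\lambda$.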

After obtaining clear-cut results on heart-rate variability and atmospheric turbulence [16], we
investigate the impact of the non-stationary nature of financial time series on a large set of
stylized facts. Specifically, from a thorough characterization of the features of the trading
volume at short time scales (1 minute), we aim not only at describing its statistical properties
but also at introducing a proper representation of price fluctuations from statistical properties
of trading volume, as first endeavoured using daily data [?] and more recently essayed in [22]
using coupled equations. Concretely, our analysis focuses on the dataset composed of price
fluctuations, $r_i(t) = S_i(t) - S_i(t-1)$, and the trading volume, $V_i(t)$, of the 30 blue chip
companies defining the Dow Jones Industrial Average, recorded every minute during the second
semester of 2004. This corresponds to circa $5 \times 10^4$ data points for each quantity of every
stock $i$. For convenience, we have normalized the trading volume of each stock by its average
value over the span, $v_i(t) = V_i(t)/\bar{V}_i$. The price fluctuations (or returns) were kept as
defined.



2 Heterogeneities in trading volume 

2.1 Statistics of the patches 

As we want to describe established key facts and statistical features of financial markets from
trading volume, we start our analysis by applying the KSS algorithm to this last quantity.
Although the algorithm works well without any additional constraint, e.g., a lower bound for the
size of the segments, we restricted the length $\ell$ to a minimum of 30 minutes. This is the time
scale describing a first regime of the autocorrelation function of the trading volume that was
found not only for this same data but also for data from other markets [10,23]. It is worth
mentioning that the introduction of this lower bound does not affect the results we present
hereinafter, namely the typical length of the segments of quasi-stationarity.^1

Let us first describe the probability density function (PDF) of the duration of the patches. From
a first visual inspection of Fig. 1 we noticed there is a well defined exponential regime,

\[ P_\ell(x > \ell) = \exp\left[ -\frac{\ell}{\lambda} \right], \tag{2} \]

which accounts for more than 95% of the empirical distribution. Because each firm presents as many
as 300 segments, the remaining 5% of the empirical complementary cumulative distribution function
(EDF), which describes around 15 segments for each set, is strongly affected by the finiteness of
the set of patches. It is thus tempting to consider the change in the behavior of the curve a
simple artifact. However, we do not have a random modification. Instead, we observe a consistent
decrease of the absolute value of its slope for all the stocks and also that the changes come to
pass at the same length $\ell \approx 330$ minutes. Therefore, we reckon there exists a second
regime in the length of stationary segments, which rules the statistics of patches that last
longer than a trading session.

Concentrating our efforts on the significant part of the distribution, we used a log-likelihood
adjustment procedure and consistently found that the EDFs are well fitted by Eq. (2) with similar
values for all the stocks. The average value, $\langle\lambda\rangle$, is equal to $116 \pm 12$
min when Microsoft (MSFT) is set aside ($\langle\ldots\rangle$ stands for averages over
companies). For Microsoft, we have found a typical scale of 230 minutes, which is quite different
from the remaining values, even when we compare it with $\lambda \approx 120$ minutes of Intel
(INT), which is also traded at NASDAQ and agrees with $\langle\lambda\rangle$. The empirical
distribution function of $\ell$ is shown in Fig. 1. The reader should note that, although we did
not remove overnight effects, which might affect a financial data analysis, our characteristic
time scale of local stationarity is significantly smaller than the span of a trading session.
Moreover, should the trading span influence our result, there would be a separation between NYSE
and NASDAQ traded stocks, which is not the case bearing in mind the Intel time scale.
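
The fit of the exponential regime can be illustrated with a minimal sketch. It is not the adjustment procedure used for the results above, whose details are not given here: it assumes that, with the 30-minute lower bound, the lengths follow a shifted exponential, for which the maximum-likelihood scale is simply the mean excess over the bound. The function names and the synthetic data are hypothetical.

```python
import random


def fit_exponential_scale(lengths, l_min=0.0):
    """MLE of lambda for P(x > l) = exp(-(l - l_min) / lambda), l >= l_min.

    Assumption of this sketch: a shifted exponential, so the estimator is
    the sample mean minus the lower bound.
    """
    if not lengths:
        raise ValueError("need at least one segment length")
    if min(lengths) < l_min:
        raise ValueError("all lengths must be >= l_min")
    return sum(lengths) / len(lengths) - l_min


def empirical_ccdf(lengths):
    """Sorted lengths and, for each, the fraction of strictly longer
    segments (exact when there are no ties)."""
    xs = sorted(lengths)
    n = len(xs)
    return xs, [(n - k - 1) / n for k in range(n)]


# Synthetic stand-in for the segment lengths of one stock (illustrative only).
random.seed(0)
L_MIN, TRUE_SCALE = 30.0, 116.0
segments = [L_MIN + random.expovariate(1.0 / TRUE_SCALE) for _ in range(300)]

lam_hat = fit_exponential_scale(segments, L_MIN)
xs, ccdf = empirical_ccdf(segments)
print(f"estimated scale: {lam_hat:.1f} min")
```

Plotting `ccdf` against `xs` on a ln-linear scale gives the kind of curve described for Fig. 1; with only about 300 segments per stock, the last few points of the tail are dominated by finite-size effects.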

The next logical step is to verify whether the lengths of the segments are related to one another.
This is appraised by looking into the behavior of the fluctuations,

\[ \Delta\ell_j(i) = \ell_{i+j} - \ell_i. \tag{3} \]

^1 It only affects the distribution for small $\ell$; the asymptotic behavior is the same.

Fig. 1. Complementary cumulative distribution function $P(\ell)$ vs segment length $\ell$ for the
companies of the DJIA index in a ln-linear scale. Apart from the deviation in the tail, a typical
exponential decay is observed. In the plot, we indicate the company with the shortest
characteristic scale, Walmart (WMT), and the company with the longest characteristic scale,
Microsoft (MSFT).



Already for immediate segments, $j = 1$, we found white noise correlations,

\[ C_{\Delta\ell_1}(l) = \langle \Delta\ell_1(i + l) \, \Delta\ell_1(i) \rangle - \langle \Delta\ell_1 \rangle^2 = \delta_{l,0}. \tag{4} \]

However, when we analysed the correlation function of $|\Delta\ell_1|$, we verified that it takes
a lag of around 4 segments to attain the noise level, which, using the value of
$\langle\lambda\rangle$, is close to the span of a trading session.

Considering high-frequency trading volume, it is known that markets tend to exhibit a high level
of activity at the beginning and at the end of the trading sessions [23]. Therefore, the KSS must
yield indications of that U-shaped profile of the trading volume within a trading session, beyond
the first indications given by the behavior of $C_{|\Delta\ell_1|}$. We first looked for a relation
between the size of the segments, $\ell$, and the corresponding average value of the trading
volume, $\mu$. Although the plots of $\ell$ versus $\mu$ are somewhat scattered (see Fig. 2), by
resorting to a standard statistical technique of local regression (see App. B), we were able to
verify that there is an inverse relation between $\ell$ and $\mu$ that goes beyond the statistical
error due to sample size. This result is plausible because it is likely that periods with little
activity (small $\mu$) last longer and that periods of high activity (large $\mu$) induce changes
in the activity level more easily, so that the local stationarity condition is also more easily
violated.
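
The idea behind the local regression step can be sketched as follows. This is a simplified stand-in for loess, assuming a tricube-weighted straight-line fit in a moving neighbourhood and omitting the robustness iterations of the full algorithm; the function name, the `frac` parameter and the toy data are hypothetical.

```python
import math


def local_linear_fit(x, y, x0, frac=0.5):
    """Tricube-weighted straight-line fit around x0, evaluated at x0."""
    n = len(x)
    k = max(2, math.ceil(frac * n))            # number of neighbours used
    # Bandwidth: distance to the k-th nearest point (guarded against zero).
    h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
    w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    if sw == 0.0:                              # degenerate neighbourhood
        return sum(y) / n
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (x0 - mx)


# Toy inverse relation between local mean volume (mu) and segment length (ell).
mu = [0.2 * k for k in range(1, 41)]
ell = [60.0 + 100.0 / m for m in mu]
smooth = [local_linear_fit(mu, ell, m) for m in mu]
print(round(smooth[0], 1), round(smooth[-1], 1))
```

On real scatter plots the smoothed curve is what reveals the trend: a decreasing `smooth` sequence corresponds to the inverse relation between segment length and local mean volume discussed above.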

With the goal of understanding how the segment lengths are distributed within each intra-day
trading hour, we have looked at the starting time of each segment of local stationarity and
afterwards we coarse grained them in such a way that the probability of obtaining a given band is
always equal to 1/8. Accordingly, if the distribution of segments were completely uniform along
the day, then these probabilities would not vary (within error bars).

Taking into account the stacked column bar plot in Fig. 3, we noted that there is in fact an
intra-day dynamics for the conditional fraction of the segment lengths. First, we observed that
short and long segments exhibit complementary behavior, i.e., longer segments have their highest
probability of starting during the first hours of a trading session, perhaps reflecting trading
sessions without much ado. After that, it decreases to values smaller than 1/8 from the second
hour of trading onwards, which indicates a strong intraday dynamics that moves on into further
sessions. For the smaller segments, we have almost the same probability, which increases as the
end of the session approaches. We relate this behavior to the practice of cleaning the order book
as the session approaches its end. With a similar dependence there is a second group of segments
of intermediate length, but for which the final surge is very pronounced. The distinctive behavior
with the trading time can be already understood from these two analyses.

Fig. 2. Typical dependence of the size of the segment of local stationarity, $\ell$, vs the local
average value of the trading volume, $\mu$, for General Electric. The line represents the local
adjustment given by the loess algorithm (see App. B).

Fig. 3. Averaged stacked column bar plot of the conditional probability of having a change of
local regime which lasts for $\ell$ minutes, averaged over all companies; the frontiers are
smoothed using a B-spline. The values in the legend represent the initial value of each interval
grouping.



Furthermore, to separate out the beginning of the session from the subsequent hours, we appraised
to what extent segments are distributed within the trading session regardless of their length.
Figure 4 shows that changes of local stationarity occur less often in the middle of the session.
Combining the analyses of Figs. 3 and 4, we perceive a dynamics totally compatible with the
aforementioned U-shape in the trading volume.

Fig. 4. Averaged probability of having a change of local stationarity for a given intra-day time
(in hours), smoothed using a B-spline.

Fig. 5. The averaged correlation function of the local mean value of the trading volume vs the lag
measured in segments. The horizontal dashed line represents the noise level. Looking at the
symbols we verify that $C_\mu(l)$ reaches the noise level for a lag around five days.
Interestingly, we notice the intra-day signature in the bouncing back of $C_\mu$ at $l = 3$
followed by other weaker and dwindling rallies at $l = 4$ intervals until the noise level is
attained.



An important point in the mixing scenario is that of the slow evolution of the fluctuating
parameters that we fix within each patch. In Fig. 5 we show the average behavior of the
correlation function of the sequence of local average values of the trading volume, $\mu$, as a
function of the lag defined in number of segments,



C„ 



(5) 



Therein, it is visible that it takes as many as 18 segments for the correlation to reach the noise
level, which fits well within the mixing approach.^2 Interestingly, we noted the existence of a
bounce back of the value of $C_\mu$ at $l = 3$, which is dimly repeated at $l = 4$ intervals until
the noise level is reached. A similar curve is found when the autocorrelation of the trading
volume is analyzed. In other words, in segmenting the series using the KSS algorithm, we preserved
the long-term correlation function, which also signals the typical scale equal to the duration of
a trading session, which is close to 4 segments of average length.
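
A correlation analysis of this kind, together with a shuffling-based noise level, can be sketched as below. Several choices are assumptions of the sketch rather than statements about the analysis above: the correlation is normalized by the variance, the noise level pools the correlation values of all lags and all shuffles before taking three standard deviations, and the slowly varying test sequence is synthetic.

```python
import random
import statistics


def autocorr(x, lag):
    """Variance-normalized autocorrelation of x at a lag given in segments."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0 or lag >= n:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / (n - lag) / var


def noise_level(x, max_lag, n_shuffles=50, seed=0):
    """Three standard deviations of the correlation of shuffled copies of x
    (pooled over lags and shuffles, an assumption of this sketch)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_shuffles):
        y = list(x)
        rng.shuffle(y)
        values.extend(autocorr(y, lag) for lag in range(1, max_lag + 1))
    return 3.0 * statistics.pstdev(values)


# Synthetic, slowly varying sequence of local means (illustrative only).
rng = random.Random(1)
mu = [1.0]
for _ in range(299):
    mu.append(0.9 * mu[-1] + 0.1 + rng.gauss(0.0, 0.05))

level = noise_level(mu, max_lag=20)
first_below = next(
    (lag for lag in range(1, 100) if abs(autocorr(mu, lag)) < level), None
)
print(f"noise level: {level:.3f}, first lag below it: {first_below}")
```

The lag at which the correlation first drops below the noise level plays the role of the number of segments quoted above for the local mean of the trading volume.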



2.2 Long-term behavior from the local statistics of 
trading volume 

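With the segmentation in hand and a first group of well-known properties matching the
segmentation results, we moved ahead into probabilistic features. Owing to the assumption that the
long-term behavior is the outcome of a statistical mixture of simple local forms, we assessed the
statistical hypothesis that the trading volume is locally described by one of these simple
two-parameter PDFs: the $\Gamma$-distribution,

\[ p_\Gamma(v; \{\phi, \theta\}) = \frac{v^{\phi - 1}}{\theta^\phi \, \Gamma(\phi)} \exp\left[ -\frac{v}{\theta} \right], \tag{6} \]

the log-Normal distribution,

\[ p_{LN}(v; \{\phi, \theta\}) = \frac{1}{\sqrt{2\pi} \, \theta \, v} \exp\left[ -\frac{(\ln v - \phi)^2}{2 \theta^2} \right], \tag{7} \]

the inverse $\Gamma$-distribution,

\[ p_{i\Gamma}(v; \{\phi, \theta\}) = \frac{\theta^\phi}{\Gamma(\phi)} \, v^{-\phi - 1} \exp\left[ -\frac{\theta}{v} \right], \tag{8} \]

and the Weibull distribution,

\[ p_W(v; \{\phi, \theta\}) = \frac{\phi}{\theta^\phi} \, v^{\phi - 1} \exp\left[ -\left( \frac{v}{\theta} \right)^\phi \right]. \tag{9} \]

^2 We define the noise level as three times the standard deviation of the correlation function
when the elements are shuffled.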

We proceeded as follows: for every stock we considered the segments obtained by the KSS and looked
for the best local fit for PDFs (6)-(9) by optimizing the respective log-likelihood function.
Subsequently, we checked the statistical significance of each fit considering the quantity
$\sqrt{\ell}\, d_{max}$, where $d_{max}$ is the maximum distance between the EDF and the fitting
cumulative distribution function assuming a Lilliefors approach.^3 From this procedure, we learnt
that the log-normal distribution presents the smallest value,
$\langle \sqrt{\ell}\, d_{max} \rangle_i = 0.82 \pm 0.06$, with the other distributions yielding
average results greater than one ($\langle\ldots\rangle_i$ stands for the average over all the
segments of company $i$).

Alternatively, having applied the Kolmogorov-Smirnov statistical distance criterion with an
$\alpha$-value equal to 0.05 to each segment for each testing distribution, we found an average
ratio of statistical significance equal to $0.95 \pm 0.04$ for the log-Normal. For the remaining
test distributions we got a statistical significance ratio equal to $0.78 \pm 0.06$ for the
$\Gamma$-distribution and $0.81 \pm 0.04$ for the Weibull distribution. Once more, the worst fit
is for the inverse $\Gamma$-distribution, which gave $0.41 \pm 0.09$, i.e., a performance ratio
below 1/2. The good results of the $\Gamma$-distribution underpin the previous approach of a local
Feller process [?].

^3 We opted for the Lilliefors criterion instead of the standard Kolmogorov one in order to check
the difference between distributions on the left and on the right of each value.

An individual analysis of the companies also shows that the log-Normal tested as the best local
distribution for all the 30 stocks and the inverse $\Gamma$-distribution as the worst of the test
hypotheses.
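
For the log-Normal case, the local fit and the scaled distance can be sketched in a few lines. The sketch assumes that the maximum-likelihood parameters are the mean and standard deviation of the logarithm of the volume and that the distance is the usual two-sided Kolmogorov-Smirnov distance computed with the fitted parameters; the function names and the synthetic segment are hypothetical.

```python
import math
import random


def fit_lognormal(v):
    """MLE of (phi, theta): mean and standard deviation of ln v."""
    logs = [math.log(x) for x in v]
    phi = sum(logs) / len(logs)
    theta = math.sqrt(sum((y - phi) ** 2 for y in logs) / len(logs))
    return phi, theta


def lognormal_cdf(x, phi, theta):
    """Cumulative distribution function of the log-Normal PDF."""
    z = (math.log(x) - phi) / (theta * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def ks_distance(v, cdf):
    """Maximum distance between the empirical CDF of v and a model CDF,
    checked on both sides of every step of the empirical curve."""
    xs = sorted(v)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs((k + 1) / n - f), abs(f - k / n))
    return d


# Synthetic stand-in for the volumes of one segment (illustrative only).
random.seed(2)
segment = [random.lognormvariate(-0.3, 0.8) for _ in range(120)]

phi, theta = fit_lognormal(segment)
d_max = ks_distance(segment, lambda x: lognormal_cdf(x, phi, theta))
stat = math.sqrt(len(segment)) * d_max
print(f"phi={phi:.2f}, theta={theta:.2f}, sqrt(l)*d_max={stat:.2f}")
```

Because the parameters are estimated from the same data that are then tested, critical values differ from those of the plain Kolmogorov test, which is the reason a Lilliefors-type criterion is invoked in the text.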

As previously denoted by Eq. (1), the long-term distribution is the result of a local statistics,
$p(v; \{\phi, \theta\})$, that is weighted taking into consideration the statistics of $\phi$ and
$\theta$, $g(\phi, \theta)$. Let us start by reporting how $\theta$ is distributed. To tackle this
point, we compared the test distributions by computing $\sqrt{n}\, d_{max}$, where $d_{max}$ is
again the maximal distance between the EDF and each of the complementary cumulative distribution
functions after a log-likelihood adjustment and $n$ is the number of $\theta$ values in the set,
i.e., the number of segments we obtained for each stock. The averages over all the companies and
the medians of $\sqrt{n}\, d_{max}$ are,

                               average         median
    Γ-distribution:            1.05 ± 0.58     0.95
    inverse Γ-distribution:    1.39 ± 0.47     1.35
    log-Normal:                4.94 ± 2.27     5.09          (10)
    Weibull:                   1.13 ± 0.54     1.10
    inverse Weibull:           2.15 ± 0.54     2.16

showing that the distribution which better describes the long term behavior is the
$\Gamma$-distribution. Looking more attentively at the results, we perceived different behavior
for NYSE and NASDAQ traded stocks. For the former, $p(\theta; \{\gamma, \kappa\})$ is best
described by Eq. (6) with average values $\gamma = 32.8 \pm 4.7$ and $\kappa = 0.028 \pm 0.004$
and medians equal to 32.3 and 0.028, respectively. Then again, for Intel and Microsoft, the best
fit $p(\theta; \{\gamma, \kappa\})$ is given by Eq. (9) with similar exponent and scaling
parameters for both stocks, namely $\{\gamma = 3.25, \kappa = 1.26\}$ and
$\{\gamma = 2.86, \kappa = 1.19\}$. The values of the parameters $\gamma$ and $\kappa$ of NYSE
companies gave on average $\theta$ equal to $0.92 \pm 0.14$ and a standard deviation equal to
$0.16 \pm 0.02$, while for Intel and Microsoft we have $1.13 \pm 0.38$ and $1.06 \pm 0.40$,
respectively. It is worth remembering that we have normalized our finite series of trading volume
by dividing each one by its average value.

The problem of the PDF of $\phi$ is very much simplified by another empirical finding of ours. In
performing a scatter plot of the local average, $\mu_i = \bar{v}_i$, versus the local variance,
$\omega_i = \overline{v^2}_i - \bar{v}_i^2$, we perceived a clear dependence between these two
moments. Setting the scatter plot in a ln-ln scale, Fig. 6 shows that this dependence is close to
a dual linear relation,

\[ \ln \mu_i = \left[ \alpha^{(>)} \ln \omega_i + \eta'^{(>)} \right] \Theta[\ln \omega_i - \tilde{\mu}] + \left[ \alpha^{(<)} \ln \omega_i + \eta'^{(<)} \right] \Theta[\tilde{\mu} - \ln \omega_i], \tag{11} \]

with,

\[ \tilde{\mu} = \frac{\beta^{(>)} - \beta^{(<)}}{\alpha^{(<)} - \alpha^{(>)}}, \tag{12} \]

where $\Theta$ is the Heaviside function and
$\eta'^{(\gtrless)} = \beta^{(\gtrless)} + \eta^{(\gtrless)}$ (for the sake of conciseness we
omitted the dependence of $\mu_i$ and $\omega_i$ on $\phi$ and $\theta$). The variable
$\tilde{\mu}$ represents the value at which we obtained a crossover and $\eta^{(\gtrless)}$ is
Gaussian distributed with standard deviation $\sigma_\eta^{(\gtrless)}$ and null mean. For all the
companies, except 3M, the crossover dependence Eq. (11) was found with
$\langle \tilde{\mu} \rangle = 1.23 \pm 0.88$. This suggests the existence of a regime for smaller
and another one for larger trading volumes, as a previous scaling analysis suggested [13].
Regarding the remaining parameters we obtained the following values above and below $\tilde{\mu}$,

    α:     0.24 ± 0.08,   0.45 ± 0.05;
    β:     0.22 ± 0.19,   0.03 ± 0.11;          (13)
    σ_η:   0.39 ± 0.08,   0.28 ± 0.05.

As regards the median, which is less sensitive to extreme values, we got:
$\alpha^{(\gtrless)} = \{0.23, 0.44\}$, $\beta^{(\gtrless)} = \{0.22, 0.02\}$,
$\sigma_\eta^{(\gtrless)} = \{0.37, 0.26\}$. Two notes on the relation between $\omega_i$ and
$\mu_i$ are still worthwhile: first, we tried adjusting the scatter plot with a 2nd order
polynomial, but the results were clearly worse; second, although the dual relation provides a
better description of the data, a simple power-law adjustment fits the points fairly well, as
shown by the dotted line in Fig. 6.

Fig. 6. Scatter plot of $\ln \mu_i$ vs $\ln \omega_i$ for General Electric. The full line
represents a numerical adjustment with Eq. (11) and the dotted line is a simple power-law.



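For a log-Normal distribution defined by the parameters $\phi$ and $\theta$, the mean and the
variance are equal to,

\[ \mu = \exp\left[ \phi + \frac{\theta^2}{2} \right], \tag{14} \]

and,

\[ \omega = \left( \exp[\theta^2] - 1 \right) \exp[2\phi + \theta^2], \tag{15} \]

respectively. Using these two equalities, we get for $\tilde{\mu} \to \pm\infty$,

\[ \phi = \frac{\beta}{1 - 2\alpha} - \frac{\theta^2}{2} + \frac{\alpha \ln\left[ \exp[\theta^2] - 1 \right]}{1 - 2\alpha}. \tag{16} \]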
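
The step from Eqs. (14) and (15) to Eq. (16) can be spelled out as follows. The sketch assumes a single linear regime, $\ln\mu = \alpha \ln\omega + \beta$, with the noise term dropped and $\alpha \neq 1/2$; this simplification is made here only to expose the algebra.

```latex
\begin{align*}
\ln\mu &= \phi + \tfrac{\theta^2}{2}, \qquad
\ln\omega = \ln\!\left(e^{\theta^2} - 1\right) + 2\phi + \theta^2, \\
\phi + \tfrac{\theta^2}{2}
  &= \alpha\left[\ln\!\left(e^{\theta^2} - 1\right) + 2\phi + \theta^2\right] + \beta, \\
(1 - 2\alpha)\,\phi
  &= \beta - (1 - 2\alpha)\,\tfrac{\theta^2}{2}
     + \alpha \ln\!\left(e^{\theta^2} - 1\right), \\
\phi &= \frac{\beta}{1 - 2\alpha} - \frac{\theta^2}{2}
        + \frac{\alpha \ln\!\left(e^{\theta^2} - 1\right)}{1 - 2\alpha}.
\end{align*}
```

In this regime $\phi$ is fixed by $\theta$, so only the distribution of $\theta$ has to be specified.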
While for the case of the dual-linear relation it is not hard to obtain the equations yielding the
crossover parameters $\phi_c$ and $\theta_c$,

\[ \begin{cases} \phi_c + \dfrac{\theta_c^2}{2} = \dfrac{\alpha^{(<)} \beta^{(>)} - \alpha^{(>)} \beta^{(<)}}{\alpha^{(<)} - \alpha^{(>)}}, \\[2ex] 2\phi_c + \theta_c^2 + \ln\left[ \exp[\theta_c^2] - 1 \right] = \tilde{\mu}, \end{cases} \tag{17} \]

it is very hard to obtain a simple expression analogous to Eq. (16), namely
$\phi(\theta; \phi_c, \theta_c)$. Moreover, despite the fact that the fits using Eq. (11) are
better than a simple power-law and that this dual approach also helps verify previous results on
disparities between small and large trading volume, the approach based on Eq. (11) drags in
additional complications, particularly when one wants to apply fast numerical integration methods
such as the global adaptive strategy algorithm [?].

Bringing together all these findings, the long-term distribution of trading volume is finally
obtained by performing the integration,

\[ P(v) = \iiint p(v|\phi, \theta) \, f(\phi, \theta, \ell) \, d\phi \, d\theta \, d\ell = \iiint F_\ell[\mu(\phi, \theta)] \, g(\theta) \, f_\ell(\ell) \, p(v|\phi, \theta) \, d\phi \, d\theta \, d\ell, \tag{18} \]

where,

\[ p(v|\phi, \theta) = \frac{1}{\sqrt{2\pi} \, \theta \, v} \exp\left[ -\frac{(\ln v - \phi)^2}{2 \theta^2} \right] \tag{19} \]

expresses the local log-Normal dependence of the trading volume. The function

\[ \delta\left( \phi - \left[ \frac{\beta}{1 - 2\alpha} - \frac{\theta^2}{2} + \frac{\alpha \ln\left( \exp[\theta^2] - 1 \right)}{1 - 2\alpha} \right] \right) \tag{20} \]

embodies the dependence between local average and local standard deviation. The function
$F_\ell[\mu(\phi, \theta)]$ is a Dirac delta functional similar to Eq. (20) that allows writing
the length of a segment, $\ell$, as a function of the local parameters $\phi$ and $\theta$ via the
local value of $\mu$ given by Eq. (14), namely

\[ F_\ell[\mu(\phi, \theta)] = \delta\left( \ell - h\left( \exp\left[ \phi + \frac{\theta^2}{2} \right] \right) \right), \tag{21} \]

where the function $h(x)$ represents the fit for the loess curves $\ell$ vs $\mu$ (see, e.g.,
Fig. 2) with its argument, $\mu$, replaced by the primary parameters $\phi$ and $\theta$ in
accordance with Eq. (14). According to what we said, the distribution of $\theta$ is given by,

\[ g(\theta) = \frac{\theta^{\gamma - 1}}{\kappa^\gamma \, \Gamma(\gamma)} \exp\left[ -\frac{\theta}{\kappa} \right], \tag{22} \]

and finally $f_\ell(\ell)$ is given by Eq. (2).

Unfortunately, the analytical evaluation of Eq. (18) is not possible in this case. As for a
numerical solution, we can proceed in two ways. In the first case, we assume that the length of a
segment and the local moments are independent and that the relation between $\ln \mu$ and
$\ln \omega$ is linear. Alternatively, one can appraise the mixing scenario by considering a
weighted mixture of the $n$ local log-Normal distributions defined by the local parameters
$\phi_i$ and $\theta_i$, wherein the relative length of the $i$-th segment, $\ell_i / L$, plays
the role of the weight,

\[ P(v) = \sum_{i=1}^{n} \frac{\ell_i}{L} \, \frac{1}{\sqrt{2\pi} \, \theta_i \, v} \exp\left[ -\frac{(\ln v - \phi_i)^2}{2 \theta_i^2} \right] \tag{23} \]

($L$ is the length of the time series). As seen in Fig. 7, despite the simplifications both
approaches already yield a good agreement for small, central and large values of the trading
volume. This is particularly clear for the latter case, in which we basically do not assume any
approximation. In this case, we also implicitly benefit from using information on the finiteness
of the data, whereas Eq. (18) assumes an infinitely long time series and neglects the local
average - segment length relation, which explains the better results given by the green dashed
curves in Fig. 7.

Fig. 7. Long-term distribution function $P(v)$ vs $v$. The points are the empirical PDF for the
trading volumes of General Electric. The red dotted line is obtained by numerically integrating
Eq. (18) assuming the approximations described in the text and the green dashed line was obtained
using Eq. (23). The upper panel uses a log-log scale and the lower panel uses a linear-linear
scale.
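
The length-weighted mixture of local log-Normal distributions is straightforward to evaluate numerically. The following sketch assumes the segmentation output is available as a list of (length, phi, theta) triples; the three triples used here are hypothetical values chosen only to exercise the code.

```python
import math


def lognormal_pdf(v, phi, theta):
    """Local log-Normal density with parameters phi and theta."""
    z = (math.log(v) - phi) / theta
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * theta * v)


def mixture_pdf(v, segments):
    """Mixture of local log-Normals weighted by relative segment length.

    `segments` is a list of (length, phi, theta) triples; the weights are
    length / total length, so the mixture stays normalized.
    """
    total = sum(length for length, _, _ in segments)
    return sum(
        length / total * lognormal_pdf(v, phi, theta)
        for length, phi, theta in segments
    )


# Hypothetical segmentation output: (length in minutes, phi, theta).
segments = [(45, -0.6, 0.7), (180, -0.2, 0.9), (120, 0.1, 0.6)]

for v in (0.1, 1.0, 10.0):
    print(v, mixture_pdf(v, segments))
```

Since every ingredient comes directly from the segments, this route needs neither the distribution of the local parameters nor the relation between segment length and local mean, which is consistent with the remark that it involves essentially no approximation.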
2.3 Probing the relation between trading volume and price fluctuations at the mesoscopic scale

"It takes volume to make price move" [12]. This assertion has been consistently quoted and used
as the starting point of attempts to establish a dynamical relation between trading volume and
price fluctuations [?,28,29]. While the famous adage is strongly supported and cultivated by
generations of brokers and econometricians, particularly those who are interested in futures [30],
the evolution to a stronger quantitative approach based on high-frequency data analysis,
especially the survey of order books, brought challenging explanations for the actual
micro-mechanisms leading to the statistics of price fluctuations [31], namely the fact that large
fluctuations are the outcome of differences between ask and bid prices [32]. Nevertheless, the
feeling that both plummets and significant increases are associated with large volumes remained,
in part due to the fact that historical slumps (at the daily scale) were accompanied by large
trading volumes and also because the cross-correlation function between price fluctuations and
trading volume is above noise level. Therefore, the natural question is: what is the real impact
of trading volume on price fluctuations? Since fat tails in the distribution of trading volume are
mainly the outcome of the heterogeneities in the activity (in our case in $\phi$ and $\theta$) we
can ask a slightly different question based on the classical works of Christie [32] and
Rogalsky [?]: to what extent are price fluctuations determined by the non-stationarity of the
trading volume? From a probabilistic point of view, the simplest starting point to answer this
question is to consider Bayes' law,



\[ \Pi(r) = \int p(r|v) \, P(v) \, dv. \tag{24} \]

For highly liquid blue chip companies and a 1 minute sampling rate, we definitely have
$P(v = 0) = 0$. Nonetheless, it is possible that a given volume $v$ yields no price fluctuation.
To characterize the likelihood of this type of event, we defined a probability,

\[ g^{(0)}(v) = 1 - g^{(+)}(v) - g^{(-)}(v), \tag{25} \]

where $g^{(\pm)}(v)$ corresponds to the probability of having a positive (negative) price
fluctuation for a trading volume $v$. The functions $g^{(\pm)}(v)$ should verify two conditions:
first, setting aside events like stock splits and dividends, when there is no trading volume the
price remains constant, i.e., $g^{(\pm)}(0) = 0$; second, for large values of $v$, it most surely
approaches a value independent of the trading volume, which is not necessarily equal for negative
and positive price fluctuations, as verifiable in Fig. 8. For these reasons, we assumed that
$g^{(\pm)}(v)$ is fairly described by,

\[ g^{(\pm)}(v) = G \tanh\left( \varpi \, v^{\rho} \right). \tag{26} \]

Averaging over all the companies we have,

                 G               ϖ               ρ
    g^(-):       0.4 ± 0.03      2.56 ± 0.87     0.3 ± 0.1           (27)
    g^(+):       0.47 ± 0.04     1.22 ± 0.29     0.25 ± 0.05.

These values, followed by a visual inspection of Fig. 8, indicate that for small trading volume
values the probability of having a negative price fluctuation is higher than the probability of a
positive one, with the relation between $g^{(-)}$ and $g^{(+)}$ changing for $v \approx 1$. At
first glance, and taking into consideration the risk-aversion ethos of financial agents, we would
expect precisely the opposite, i.e., small trading values dominating price rises and large trading
values associated with price decreases. However, minding the covariance
$\langle (r - \langle r \rangle)(v - \langle v \rangle) \rangle$, we understand that this behavior
corresponds to a high-frequency verification of Ying's findings [?] about the existence of a
correlation between price fluctuations and trading volume, thus providing some quantitative
support to the adage.
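
The change of ordering between the two probabilities near unit normalized volume can be checked directly from the company-averaged parameters quoted above. The sketch below assumes the saturating form G tanh(ϖ v^ρ) for the sign probabilities; the function names are hypothetical and the parameter tuples simply restate the averaged values.

```python
import math


def g_sign(v, G, varpi, rho):
    """Saturating form G * tanh(varpi * v**rho); it vanishes at v = 0."""
    return G * math.tanh(varpi * v ** rho)


# Company-averaged values (G, varpi, rho) quoted in the text.
NEG = (0.40, 2.56, 0.30)   # negative price fluctuations
POS = (0.47, 1.22, 0.25)   # positive price fluctuations


def g_zero(v):
    """Probability of no price change at normalized volume v."""
    return 1.0 - g_sign(v, *POS) - g_sign(v, *NEG)


for v in (0.1, 1.0, 10.0):
    print(
        v,
        round(g_sign(v, *NEG), 3),
        round(g_sign(v, *POS), 3),
        round(g_zero(v), 3),
    )
```

With these values the negative branch lies above the positive one at small volume, the two nearly coincide at a normalized volume of one, and the positive branch dominates at large volume, where each probability saturates at its own value of G.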
Fig. 8. Probability of having a positive (negative) price fluctuation vs trading volume. The
points were obtained for data of General Electric and the lines are the best fits using Eq. (26).



As regards trading volume associated with non-zero price fluctuations, it is worth appraising the
relation between volume and the price fluctuations,

\[ |r_t| = \langle |r_t| \,|\, v_t \rangle + \eta_t = I(v_t) + \eta_t, \tag{28} \]

where $\langle |r| \,|\, v \rangle$ represents the expected magnitude of the price fluctuation
produced by trading volume $v$. We have represented this deterministic part of the relation
between the price fluctuations and volume by $I(v)$, dubbing it trading impact. Although the term
impact has been introduced in the context of order book analysis [?], it has been employed in
longer spells in which accumulated (meta) orders are considered [37]. At odds with first proposals
that assumed a linear relation between the price difference and trading volume [38],
$S' - S = \lambda^{-1} v$ (with $\lambda$ being the market depth), later approaches backed up by
empirical analysis have proposed that the long term $I(v)$ is well described by either power-law
or logarithmic, $|r_t| \sim \log v_t$, forms for the Paris and London stock markets at the
tick-by-tick [?,35] and 30 minute scales [?].

In what follows, we tested the homogeneity of such
proposals, i.e., we aimed at finding whether the changes in
the local features of trading volume would impinge on
its relation to the price fluctuation. We carried out this
approach twofold: we tried to find a relation between the
parameters describing the form of the trading impact
function and the size of the segments. Along these lines, we
have tested three different forms,



i) $I(v) = a + b \ln v_t$,

ii) $\ln I(v) = a + b \ln v_t$,

iii) $\ln I(v) = a + b\, v_t$,    (29)



where we have considered separate versions of each one
for positive and negative returns. The power-law and
logarithmic test functions are inspired by previous work on
impact functions, and the exponential, case iii) in Eq. (29),
is introduced because in complex systems power laws
ubiquitously emerge from a mixture of exponential
functionals with different characteristic parameters.
Since scatter plots of the price fluctuations with respect
to the trading volume are rather noisy at a local scale,
we resorted once more to the loess regression technique
to obtain cleaner curves. Afterwards, we adjusted these
curves using the expressions in Eq. (29) and compared the
$\chi^2$ per degree of freedom values in order to appraise which
of the forms is the best. The average values for negative
and positive returns are the following,



(negative) (positive) 

i) 4 X 10"'' ± 0.002 9 X 10-5 ± 3 X lO"" 

ii) 0.006 ±0.003 0.006 ±0.003 

iii) 0.045 ±0.033 0.042 ±0.017. 



(30) 



Setting our sights on the best approach (smallest values),
i.e., the logarithmic fit i) in Eq. (29), we have for the
negative returns a = 0.056 ± 0.021 and b = 0.0062 ± 0.004
and for positive returns a = 0.054 ± 0.015 and
b = 0.0059 ± 0.002. In other words, we have not found a
significant difference between positive and negative price
fluctuations in respect of the trading impact. In addition,
further statistical analysis of a and b showed that their
distributions are significantly peaked.
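
As an illustration of how the three test forms of Eq. (29) can be compared, the following minimal sketch fits each form by ordinary least squares in the appropriately transformed variables and ranks them by the residual sum of squares per degree of freedom. It is not the authors' code: the function names are hypothetical, the fits for forms ii) and iii) are assumed to be done in log space, and unit measurement errors are assumed, so the score is only a stand-in for the per-degree-of-freedom statistic used in the text.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_impact_forms(volume, impact):
    """Fit forms i)-iii) of Eq. (29) to a smoothed impact curve.

    Returns {name: (a, b, score)}, where score is the residual sum of
    squares per degree of freedom (unit errors assumed), measured on
    I(v) itself so that the three forms are directly comparable.
    """
    identity = lambda z: z
    # (transform of v, transform of I, inverse transform back to I)
    forms = {
        "i_logarithmic": (math.log, identity, identity),
        "ii_power_law": (math.log, math.log, math.exp),
        "iii_exponential": (identity, math.log, math.exp),
    }
    dof = len(volume) - 2  # two fitted parameters, a and b
    results = {}
    for name, (fx, fy, inverse) in forms.items():
        xs = [fx(v) for v in volume]
        ys = [fy(i) for i in impact]
        a, b = linear_fit(xs, ys)
        rss = sum((i - inverse(a + b * x)) ** 2 for x, i in zip(xs, impact))
        results[name] = (a, b, rss / dof)
    return results


# Synthetic check: a curve that is exactly logarithmic in the volume.
volume = [float(k) for k in range(1, 51)]
impact = [0.05 + 0.006 * math.log(v) for v in volume]
fits = fit_impact_forms(volume, impact)
best = min(fits, key=lambda name: fits[name][2])
```

On real data the input would be the loess-smoothed curve of each segment rather than a synthetic one, and the comparison would be repeated separately for positive and negative returns.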

Keeping our focus on the heterogeneities of the data,
we further analyzed whether there is a relation between
the parameters a, b and the size of the segments, $\ell$ (see
Fig. [5]). Considering the linear adjustment of $a(\ell)$ and
$b(\ell)$ for positive and negative price fluctuations, we
verified they hardly vary with the segment length yielding
median slopes equal to Sq = {—4.6 x 10"^, —4.3 x lO"^} 
and Sb = {-7.7 x 10-^,2.8 x 10"^} with the same be- 
havior verified using local regression. Bearing in mind the 
magnitude of these slopes we assert that the trading
impact functions are homogeneous. These two findings are
represented in Fig. 9.

Fig. 9. Left: value of parameter a in test i) of Eq. (29) vs
length of the segment $\ell$ (upper panel) and EDF of detrended
and normalized a (lower panel) for each patch. Right: the same
but for parameter b. In the lower panels the green lines
represent the complementary cumulative distribution function of the
Normal distribution showing that both a and b are not
Gaussian distributed in the long term. These data are for General
Electric.

Let us finally introduce a simple argument which aims
at explaining the expected relation between return and
volume. First, we describe the simple case wherein a
trading volume does not change the price. In this case, we have
in the long term,

$$\Pi(r = 0) = \int \left[ 1 - g^{(0)}(v) \right] P(v)\, dv, \qquad (31)$$

where $P(v)$ is the long-term distribution Eq. (19) [or Eq. (23)
in practical applications]. With respect to non-zero
returns, the conditional distribution has a different form,
namely,

$$p(|r| \,|\, v) = f(|r| \,|\, v) \left[ g^{(+)}(v) + g^{(-)}(v) \right], \qquad (32)$$

where $f(|r| \,|\, v)$ is the double conditional probability of
having a return of magnitude $|r|$ given a trading volume
$v$ that produces a non-zero price fluctuation. Allowing
for Eq. (28) and assuming that the error in the numerical
adjustment, $\eta_t$, follows a Gaussian distribution,

$$\mathcal{G}(\eta) = \frac{1}{\sqrt{2\pi}\, \sigma_\eta} \exp\left[ -\frac{(\eta - \langle \eta \rangle)^2}{2 \sigma_\eta^2} \right], \qquad (33)$$

we have

$$f(|r| \,|\, v) \approx \mathcal{G}(|r| - \{a + b \ln v\}, \sigma_\eta), \qquad (34)$$

and thus finally for $|r| \neq 0$ we get,

$$\Pi(|r|) = \int \mathcal{N}_{|r|}(a + b \ln v, \sigma_\eta) \left( g^{(+)}(v) + g^{(-)}(v) \right) P(v)\, dv. \qquad (35)$$



In the last definitions we scrapped the distinction between 
positive and negative returns for the sake of simplicity. 

^ Hereafter, we use the approximately-equal sign because
$f(|r| \,|\, v)$ can only have a truncated form, as is used
in several problems, to take into account that $|r| > 0$.
Fig. 10. Probability of the magnitude of price fluctuations,
$\Pi(|r|)$, vs magnitude of the price fluctuations, $|r|$, for General
Electric. The red line was obtained by the (numerical)
integration of Eq. (35). The plot shows a factor 10 between data and
the testing hypothesis.



Plugging Eq. (^5]) into Eq. we are able to verify 
that although there is a relation between price
fluctuations and trading volume, it only yields a fair
representation of the peak of the distribution, which decays more
slowly than the PDF from price-volume arguments (see
Fig. 10). Taking into consideration that the peak of a
distribution concentrates the key part of the measure, we can
state that the heuristic adage relating trading volume and
price fluctuations is in some sense verified. However, it
completely fails at describing the stylized fact concerning
the slow decay of the distribution $\Pi(|r|)$, as Fig. 10 clearly
shows. So, what are the reasons for such a misfire? Within
the context of the heterogeneous approach, we can
understand the different behavior of price fluctuations with
respect to trading volume by applying the KSS algorithm.
Looking at the results we present in Fig. 11, we found
that the length of the segments of local stationarity in $|r|$
still follows an exponential form, but with a much
shorter typical scale than the segmentation of trading
volume, namely 77 ± 15 minutes. This shows the different
dynamics of both quantities, particularly the respective
degree of non-stationarity. We must still remember that in
the present approach we wiped out factors like the
fluctuations of the parameters of the local impact functions
that can be regarded as a proxy for the local volatility.
Accordingly, using a very different methodology, our
results go along with the conclusion that large price fluctuations
are more about the volatility than the volume [29,41]. We
shall come back to this point in the Discussion.
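
The numerical integration behind the price-volume prediction for the distribution of $|r|$ can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the long-term volume distribution is taken to be a plain log-normal, the probability of a non-zero fluctuation is passed in as an arbitrary callable, and the noise width `sigma_eta` as well as all function names are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(v, mu, s):
    """Log-normal density, used here as an assumed long-term P(v)."""
    z = (math.log(v) - mu) / s
    return math.exp(-0.5 * z * z) / (v * s * math.sqrt(2.0 * math.pi))


def pi_abs_r(abs_r, a, b, sigma_eta, g_nonzero, p_volume,
             v_min=1e-4, v_max=1e4, n=4000):
    """Trapezoidal estimate of
        integral N(|r|; a + b ln v, sigma_eta) g(v) P(v) dv
    on a logarithmic grid in v, where g(v) is the probability of a
    non-zero price fluctuation at volume v.
    """
    lo, hi = math.log(v_min), math.log(v_max)
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        u = lo + k * h
        v = math.exp(u)
        integrand = (gaussian(abs_r, a + b * u, sigma_eta)
                     * g_nonzero(v) * p_volume(v) * v)  # dv = v du
        total += integrand * (0.5 if k in (0, n) else 1.0)
    return total * h


# Illustrative call: a and b are the values quoted in the text for
# positive returns; sigma_eta = 0.02 and g = 0.8 are made-up numbers.
example = pi_abs_r(0.06, a=0.054, b=0.0059, sigma_eta=0.02,
                   g_nonzero=lambda v: 0.8,
                   p_volume=lambda v: lognormal_pdf(v, 0.0, 1.0))
```

Evaluating `pi_abs_r` on a grid of $|r|$ values gives the model curve that is compared with the empirical distribution; because the Gaussian kernel has a fixed width, the resulting curve cannot reproduce a slowly decaying tail.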



3 Mixing description of price fluctuations 

Having verified that trading volume is not a relevant
factor leading to fat tails in the distribution of price
fluctuations, we resort to the KSS in order to milk some
further information on the impact of the non-stationarity of
the returns on their local and long-term statistical
properties. Traditionally, the volatility has been justified by the
impact of trading orders, but recent results at the order
book level as well as the results of our previous section




Fig. 11. Cumulative distribution of the segment lengths (in
minutes) from the segmentation of the absolute values of the price
fluctuations. Contrarily to Fig. [TJ a clear exponential decay is
visible and the average over characteristic times yields
77 ± 15 minutes. The qualitative behavior among stocks is also
different. In this case the least non-stationary series is United
Technologies (UTX) whereas the most non-stationary is Pfizer (PFE).



have shown that volatility actually reflects a raft of other
things, e.g., the random component in our trading impact,
among others. In our framework, the first thing to do is
to classify the statistical nature of the local standard
deviation. We take as the local squared volatility the variance
of the corresponding segment resulting from the segmentation of
the price fluctuations. Applying the same statistical
procedures of Sec. 2.2, we verified that the best global
distribution of the squared volatility (local variance) is given
by the inverse-Gamma distribution of Eq. ([5]) with aver- 
age parameters (j) = 2.5 ± 0.7 and = (4 ± 1) x 10""^. For 
some companies the Gamma distribution gave significant
results as well. The observation of an inverse-Gamma
distribution for the squared volatility is in accord with
previous studies [32] and it concurs with previous theoretical
approaches aimed at justifying the use of the Student-t
(or q-Gaussian). However, this is just part of the story:
to get such a long-term distribution we still need to give
statistical evidence that the price fluctuations are locally
Gaussian. Against the odds, we found that the local
distribution is best described by an exponential
distribution,



$$p(r; \mu, \sigma) = \frac{1}{2\sigma} \exp\left[ -\frac{|r - \mu|}{\sigma} \right], \qquad (36)$$



for which the local average is equal to $\mu$ and the local
variance is equal to $\Sigma = 2\sigma^2$. Once again, by employing the
same mixing procedure, we obtain the long-term
distribution of the price fluctuations as presented in Fig. 12.
In place of looking for full integration, we can simplify
the calculation of $P(r)$ by noticing that the key deviation
from the local exponential distribution comes from the
large values of the volatility. When $\Sigma \to \infty$ its distribution
decays as $\Sigma^{-\phi-1}$. Using the asymptotic behavior in the
integration rather than its full form we get $P(r) \sim |r|^{-2\phi-1}$.



^ These stocks are: American Express, Boeing, IBM, JP Morgan,
Walmart.
The plots in different types of scales clearly show that
the local Gaussian distribution does not allow a good
representation of the long-term distribution $P(r)$ both in the
central part and the tails. In addition, we verified that
$P(r)$ decays almost as an exponential, which is compatible
with the large asymptotic exponent we obtained after
using the values of $\phi$ found by fitting the local variance.
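
The asymptotic exponent mentioned above can be recovered with a short calculation. The sketch below is a reconstruction rather than the authors' derivation, and its notation is an assumption: $\Sigma = 2\sigma^2$ denotes the local variance of the exponential distribution of Eq. (36), $\rho(\Sigma)$ its inverse-Gamma distribution with shape parameter $\phi$, and the local mean is set to zero.

```latex
% Assumed notation: \Sigma = 2\sigma^2 (local variance), \phi (inverse-Gamma
% shape parameter), \mu = 0. Only the large-\Sigma tail of \rho is used.
\begin{align}
P(r) &= \int_0^\infty \frac{1}{2\sigma}\, e^{-|r|/\sigma}\, \rho(\Sigma)\, d\Sigma,
  \qquad \rho(\Sigma) \sim \Sigma^{-\phi-1}, \\
 &\propto \int_0^\infty \sigma^{-2\phi-2}\, e^{-|r|/\sigma}\, d\sigma
  \qquad (\Sigma = 2\sigma^2,\; d\Sigma = 4\sigma\, d\sigma), \\
 &= |r|^{-2\phi-1} \int_0^\infty u^{2\phi}\, e^{-u}\, du
  \qquad (u = |r|/\sigma), \\
 &= \Gamma(2\phi+1)\, |r|^{-2\phi-1}.
\end{align}
```

With the average value $\phi = 2.5 \pm 0.7$ quoted in the text, the tail exponent $2\phi + 1$ is about 6, which is large enough for the decay to look almost exponential over the observed range.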

As a complement, we applied for each company
Student's t-test [35] to verify whether the distributions of
the local means of the returns were compatible with a
zero-mean normal distribution. The p-values obtained were p <
0.1 for most companies; two companies with large p, namely
Du Pont and McDonald's, have p = 0.39 and p = 0.33,
respectively; and five other companies (Caterpillar, IBM,
Johnson & Johnson, Altria and United Technologies) have
0.1 < p < 0.2. Concerning the skewness and the kurtosis of
the distributions, the Jarque-Bera [U] test showed a very
good agreement with a normal distribution, providing for
all companies p < 0.001.
Fig. 12. Long-term distribution of the price fluctuations in
log-linear and linear-linear scales (upper and lower panels,
respectively). The points correspond to the empirical PDF of General
Electric. The green dashed line and the magenta dotted line
are the long-term distributions obtained from the segmentation
procedure considering a local exponential and a local Gaussian
distribution, respectively. The parameters of the inverse-Gamma
distribution of the
squared volatility are </!) — 4.0 and Q = 6.3 x 10^'^. 



4 Discussion 

In this work we inspected the effectiveness of the
statistical mixture approach in mimicking primal financial
quantities: price fluctuations and trading volume. We proved
that the proper segmentation of the time series, which
considers segments of varying length, is fundamental for a
correct description of statistical and dynamical stylized facts.
We did it employing a non-parametric method of
segmentation, the KSS [T^, which assumes that differences
between segments can derive from discrepancies in any
statistical moment, which define the characteristic function
of a probability density function.

Considering the trading volume, we reinforced the idea
that a mixture of distributions is able to nicely describe
its long-term PDF. Specifically, from our results, the
long-term PDF is effectually described by the statistical
mixing of juxtaposed stationary segments of unequal length
wherein the trading volume is log-Normal distributed. The
distribution of the length of these segments is dominated
by an exponential form with a typical scale around 115
minutes, which decays much more slowly for lengths greater than
330 minutes. This last regime mainly represents the
behavior of patches of stationarity that last longer than a
trading session. Bearing in mind stochastic mechanisms
related to the log-normal distribution, we can think about
this functional form of the trading volume as the result
of a cascade of transactions, v(i) = nr=i-^''' '^ith InT 
representing the log of the size of the r-th trade that is 
Gaussian distributed with average equal to [i and standard 
deviation equal t 0. 

Locally, we also verified that there is an intrinsic re- 
lation between the local average and the variance, i.e., 
between /i and Q. With this relation, we were able to sta- 
tistically express the local behavior in terms of a single 
local Log-Normal parameter \Q in Eq. ^\ that we learnt 
being well described by a Gamma distribution. In addi- 
tion, we noted that segments with large local averages 
and small local averages behave differently, beyond
statistical effects, albeit the description considering a
single behavior also gives good results. These results are to be
compared with the simpler approach of segments of
constant length. At the probabilistic level, the constant-length
approach gives a local distribution compatible with a Gamma
distribution, which provides a fair local description of the
data, with a 15% handicap though. Notwithstanding,
crucial differences arise in the description of the data: first,
there is an important relation between the local average
and variance that would not be learnt were we using
fixed-length segments or even applying a segmentation method
based on the analysis of the means; second, and most
importantly, it would be impossible to capture and identify
important dynamical stylized facts such as the U-shape
of trading volume within each session, the slow decay of
the autocorrelation function via the correlations of the
average value of trading volume in juxtaposed stationary
patches, as well as the relation between the magnitude of
the fluctuation of the segment lengths, which are
significantly correlated within the trading session span.

Stemming from such a good description of trading
volume, we were able to shed light on the recent dispute
between partisans of the famous relation between the
trading volume and price fluctuations popularized by Karpoff
in [12] and new quantitative results that assign a minor
role to volume in the dynamics of price fluctuations. Our
results indicate that each assertion has its own domain
of validity. On the one hand, we corroborated the claim
conveyed in [5^15^ that trading volume is not a key
ingredient in large price fluctuations. On the other hand,
relying on the statistical properties of trading volume we
were capable of obtaining a fair representation of the
central part of the distribution of the magnitude of price
fluctuations. This finding agrees with the results of the
cross-correlation between trading volume and price
fluctuations [11, 28, 35], which are traditionally used as the main
argument to defend the intimate relation between price
and volume. However, our results clearly show that this
cross-correlation (not greater than 20%) basically
concentrates on small price fluctuations. Within our framework,
the most straightforward explanation for the shortcoming
of the return-volume relation is given by the
segmentation of the magnitude of the price fluctuations, the
results of which are quite different from those we obtained
for the trading volume. Explicitly, for the magnitude of
price fluctuations we got a very clear exponential decay
of the distribution of segments of local stationarity
without the slower tail exhibited by the distribution of trading
volume segments. Quantitatively we found a typical scale
around 75 minutes, which is substantially smaller than the
115 minutes found in the segmentation of trading volume.
Since the length of the stationary segments acts as a
simple, yet effective, way of quantifying the extent of the
non-stationary nature of a time series, we can understand
that changes in the impact of the volume or even in the
probability of having non-zero price fluctuations occur at
a faster scale that is not captured in the trading volume
scale, leading to a faster decay of $\Pi(|r|)$. The effects we
have just mentioned can be combined and represented by
the volatility. Thence, working at a different scale, our
results prop up the statement that "there's more to
volatility than volume" [41]. Complementarily, we might also
say that the adage about the volume being responsible for
the price changes can be accepted in the same way the
Black-Scholes equation is valid in option pricing: it gives a fair
forecast during a good part of the time, but it completely
drops the clanger in the cases wherein one can make (lose)
big money.



Finally, we verified that the (squared) volatility of price
fluctuations evolves as an inverse-Gamma distribution, which
perfectly tallies with the mixture distribution hypothesis
that assumes price fluctuations are locally Gaussian
distributed and which is the cornerstone of Engle's ARCH
model [35]. Regardless, when we looked into the local
distribution of price fluctuations we concluded that they do
not follow a Gaussian distribution, but an exponential
distribution instead. Were the local distribution Gaussian,
we would have had a long-term empirical distribution well
described by a Student-t (q-Gaussian), which is not the
case for both the central part and the tails. This result is
interesting twofold: a) although the volatility process does
not fit that of the Heston model [IHl, the local
exponential we found agrees with the short-term behavior of this
model and b) it prompts the study of different definitions
of noise in ARCH-like processes [47].

After bearing good fruit at the minute scale of stock
trading, this method can be put to use in other
financial problems at order book scale and enhance reasoning
about other financial products. Concerning the former, it
would be interesting to analyze which additional features
could be captured in the dynamical properties of
individual agents previously studied by a comparison of the local
means [48]. We should underscore that in the case of
atmospheric turbulence [11], the test of the means is useless in
the evaluation of local stationarity. In respect of other
financial products, we can mention the dynamics of futures
and other derivatives.

A KS-segmentation

Considering that a nonstationary time series can be split
into stationary segments, the aim of segmentation is to
find the optimal positions to separate the time series in
such segments. In KSS, these positions are obtained by
finding, along the series, the maximal

$$D = D_{KS} \left( 1/n_L + 1/n_R \right)^{-1/2}, \qquad (37)$$

where $n_L$ ($n_R$) is the number of points to the left (right) of
the hypothetical cutting point and $D_{KS}$ is the
Kolmogorov-Smirnov distance between the complementary cumulative
distributions of these two samples. Once we find the
position of maximal distance $D^{max}$, we test the statistical
significance (at a chosen significance level $\alpha = 1 - P_0$) of a
potentially relevant cut at that point. That is achieved by
comparison with the value of $D$ that would be obtained
were the sequence random. The critical value is given by
the phenomenological expression [16]

$$D^{max}_{crit}(n) = a \left( \ln n - b \right)^{c}, \qquad (38)$$

with (a, b, c) = (1.41, 1.74, 0.15), (1.52, 1.80, 0.14), and
(1.72, 1.86, 0.13) for $P_0$ = 0.90, 0.95, 0.99, respectively. If
$D^{max}$ exceeds the critical value for the selected significance
level, $D^{max}_{crit}(n)$, then the cut is done. The procedure is then
recursively applied starting from the full series, until no
segmentable patches are left. See [TB] for further details.
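
The recursive procedure described in this appendix can be sketched in a few lines. This is a minimal illustration rather than the reference implementation: the function names are hypothetical, the minimum segment length `min_len` is an assumption that does not appear in the text, and the brute-force search is only suitable for short series.

```python
import math
from bisect import bisect_right

# (a, b, c) of the critical-value expression for each confidence level P0
CRITICAL_PARAMS = {0.90: (1.41, 1.74, 0.15),
                   0.95: (1.52, 1.80, 0.14),
                   0.99: (1.72, 1.86, 0.13)}


def ks_distance(left, right):
    """Kolmogorov-Smirnov distance between two empirical distributions."""
    ls, rs = sorted(left), sorted(right)
    d = 0.0
    for x in ls + rs:
        f_left = bisect_right(ls, x) / len(ls)
        f_right = bisect_right(rs, x) / len(rs)
        d = max(d, abs(f_left - f_right))
    return d


def critical_value(n, p0=0.95):
    """Critical distance a * (ln n - b)**c for a series of n points."""
    a, b, c = CRITICAL_PARAMS[p0]
    return a * (math.log(n) - b) ** c


def best_cut(series, min_len=5):
    """Return (position, D) of the cut maximizing the rescaled KS distance."""
    n = len(series)
    best_pos, best_d = None, 0.0
    for i in range(min_len, n - min_len + 1):
        left, right = series[:i], series[i:]
        d = ks_distance(left, right) / math.sqrt(1.0 / len(left) + 1.0 / len(right))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(series, p0=0.95, min_len=5, offset=0):
    """Recursively split the series; returns the sorted cut positions."""
    n = len(series)
    if n < 2 * min_len:
        return []
    pos, d = best_cut(series, min_len)
    if pos is None or d <= critical_value(n, p0):
        return []  # no statistically significant cut in this patch
    return (segment(series[:pos], p0, min_len, offset)
            + [offset + pos]
            + segment(series[pos:], p0, min_len, offset + pos))


# Two juxtaposed patches with clearly different levels.
series = ([(k % 7) * 0.1 for k in range(100)]
          + [10.0 + (k % 7) * 0.1 for k in range(100)])
cuts = segment(series)
```

Because the Kolmogorov-Smirnov distance compares whole distributions, the same routine detects changes in variance or shape as well as changes in the mean, which is the property the text relies on.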



B Loess 

In order to have a smooth set of points from scattered data
$(x_i, y_i)$, $i = 1, \ldots, n$, we apply the robust locally weighted
regression (loess) [49] to obtain the estimated values for
each point. The procedure consists of two parts. First,
the weight function depends on the distance to the r-th
nearest neighbor and a weighted least-squares fitting
procedure gives the estimated values for each point. Second,
a new factor is introduced in the weighting computation,
based on the residuals of the first fitting procedure,
improving the weights in the sense that large residuals will
have small weights and small residuals will have large ones.

We can summarize the loess procedure as follows: for
each point i, we compute the distance $h_i$ from $x_i$ to its r-th
nearest neighbor. The weights of the points $k = 1, \ldots, n$
(with $k \neq i$) for each point $x_i$ will be given by

$$\omega_k(x_i) = W\left( \frac{x_k - x_i}{h_i} \right), \qquad (39)$$

where $W$ is the tricubic weight function

$$W(x) = \begin{cases} (1 - |x|^3)^3, & |x| < 1 \\ 0, & |x| \geq 1. \end{cases}$$

Then, in our cases, a linear least-squares fitting with weights
given by Eq. (39) determines the estimated $\hat{y}_i$ that
corresponds to $x_i$ and its residual, $e_i = y_i - \hat{y}_i$. A different set
of weights, $\delta_k = W(e_k/(6s))$, is defined for each $(x_i, y_i)$
based on the size of $e_k$, where $s$ is the median of $|e_i|$. The new
estimated values are obtained as before but with $\omega_k(x_i)$
replaced by $\delta_k \omega_k(x_i)$. This calculation of $\delta_k$ is iterated as
much as necessary to have a satisfactory smoothed curve,
for example until $s$ stabilizes. Further discussions and
examples are presented in [49].
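
The two-stage loess procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, the x values are assumed to be distinct, the point itself is left out of its own local fit (following the $k \neq i$ convention of the text), and the early stop for a negligible residual scale is an added safeguard.

```python
from statistics import median


def tricube(u):
    """Tricubic weight W(u) = (1 - |u|^3)^3 for |u| < 1 and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def weighted_line(xs, ys, ws):
    """Weighted least-squares line; returns (intercept, slope) or None
    when all weights vanish."""
    sw = sum(ws)
    if sw <= 0.0:
        return None
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my - slope * mx, slope


def robust_loess(xs, ys, r, robust_iterations=2, tol=1e-12):
    """Robust locally weighted linear regression evaluated at each x_i.

    r is the neighbor rank that sets the local bandwidth h_i; it should
    be at least 3 so that each local line rests on two or more points.
    """
    n = len(xs)
    # h_i: distance from x_i to its r-th nearest neighbor
    h = [sorted(abs(xs[k] - xs[i]) for k in range(n) if k != i)[r - 1]
         for i in range(n)]
    delta = [1.0] * n  # robustness weights, all equal on the first pass
    fitted = list(ys)
    scale = max(abs(y) for y in ys) + 1.0
    for _ in range(robust_iterations + 1):
        for i in range(n):
            ws = [0.0 if k == i
                  else delta[k] * tricube((xs[k] - xs[i]) / h[i])
                  for k in range(n)]
            line = weighted_line(xs, ys, ws)
            if line is not None:
                fitted[i] = line[0] + line[1] * xs[i]
        residuals = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(e) for e in residuals)
        if s <= tol * scale:  # residual scale negligible: nothing to reweight
            break
        delta = [tricube(e / (6.0 * s)) for e in residuals]
    return fitted


# Exactly linear data are reproduced by the local linear fits.
xs = [float(k) for k in range(21)]
ys = [2.0 * x + 1.0 for x in xs]
smoothed = robust_loess(xs, ys, r=8)
```

The bandwidth rank `r` controls the smoothness of the curve: a larger value averages over more neighbors and yields a smoother but less local estimate.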